# ImageNet fine-tuning
| Model | Author | License | Library | Task | Downloads | Likes | Description |
|---|---|---|---|---|---|---|---|
| Convnextv2 Tiny.fcmae | timm | | Transformers | Image Classification | 2,463 | 1 | Self-supervised feature-representation model based on ConvNeXt-V2, pre-trained with the Fully Convolutional Masked Autoencoder (FCMAE) framework; suited to image feature extraction and fine-tuning. |
| Data2vec Vision Base Ft1k | facebook | Apache-2.0 | Transformers | Image Classification | 7,520 | 2 | Self-supervised learning model based on the BEiT architecture, fine-tuned on ImageNet-1k for image classification. |
| Data2vec Vision Large Ft1k | facebook | Apache-2.0 | Transformers | Image Classification | 68 | 5 | Large variant of the BEiT-based self-supervised vision model, fine-tuned on ImageNet-1k for image classification. |
| Regnet Y 1280 Seer In1k | facebook | Apache-2.0 | Transformers | Image Classification | 18 | 1 | RegNet image classification model trained on ImageNet-1k with self-supervised (SEER) pre-training followed by fine-tuning. |
| Regnet Y 640 Seer In1k | facebook | Apache-2.0 | Transformers | Image Classification | 21 | 0 | RegNet model pre-trained in a self-supervised manner (SEER) on billions of random web images, then fine-tuned on ImageNet-1k. |
| Vit Large Patch32 384 | google | Apache-2.0 | | Image Classification | 118.37k | 16 | Vision Transformer (ViT) pre-trained on ImageNet-21k and fine-tuned on ImageNet for image classification. |
| Vit Base Patch32 384 | google | Apache-2.0 | | Image Classification | 24.92k | 20 | Transformer-based ViT image classification model, pre-trained on ImageNet-21k and fine-tuned on ImageNet. |
| Beit Large Patch16 512 | microsoft | Apache-2.0 | | Image Classification | 683 | 11 | BEiT vision-Transformer image classification model, self-supervised pre-trained on ImageNet-21k and fine-tuned on ImageNet-1k. |
| Beit Base Patch16 224 | nielsr | Apache-2.0 | | Image Classification | 28 | 0 | BEiT vision model with BERT-like self-supervised pre-training; pre-trained and fine-tuned on ImageNet-22k, then further fine-tuned on ImageNet-1k. |
| Vit Base Patch16 384 | google | Apache-2.0 | | Image Classification | 30.30k | 38 | ViT image classification model pre-trained on ImageNet-21k and fine-tuned on ImageNet. |
| Vit Large Patch16 384 | google | Apache-2.0 | | Image Classification | 161.29k | 12 | Large ViT image classification model pre-trained on ImageNet-21k and fine-tuned on ImageNet. |
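Checkpoints like the ones listed above are typically loaded through the Hugging Face `transformers` pipeline API. A minimal sketch follows; the Hub repo ids in `CHECKPOINTS` are assumptions derived from the author and model-name columns above, not verified here.

```python
# Sketch: build an image-classification pipeline for one of the listed models.
# The repo ids below are assumptions inferred from the table (author/name),
# not verified against the Hub.
CHECKPOINTS = {
    "Data2vec Vision Base Ft1k": "facebook/data2vec-vision-base-ft1k",
    "Vit Base Patch16 384": "google/vit-base-patch16-384",
    "Beit Large Patch16 512": "microsoft/beit-large-patch16-512",
}

def load_classifier(display_name: str):
    """Return an image-classification pipeline for a listed model.

    Imports transformers lazily so the mapping is usable without it installed.
    Downloads the checkpoint weights on first use.
    """
    from transformers import pipeline
    return pipeline("image-classification", model=CHECKPOINTS[display_name])

if __name__ == "__main__":
    clf = load_classifier("Vit Base Patch16 384")
    print(clf("path/to/image.jpg"))  # top predicted labels with scores
```

Fine-tuning follows the same pattern with `AutoModelForImageClassification` and a standard training loop; the pipeline form shown here is only the quickest way to run inference on a single image.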